ABSTRACT
than to broader categories of items. A comparison of the essential
unidimensionality structure across cultures is also performed.
Results indicate that in the Japanese and U.S. data from the Second
International Mathematics Study (SIMS), there are several subscales
in SIMS mathematics tests, and that individual scores should be
calibrated on each subscale rather than on a total score in the SIMS
test. Essential unidimensionality estimates for the four tests were
not the same in the two countries, calling into question the
equivalence of dimensionality of the four tests. Either items on the
test are more unidimensional in Japan, or the ability spaces among
Japanese students are more homogeneous than for U.S. students. Eleven
tables are included. (Contains 10 references.) (Author/SLD)






Effects of Mathematics Test Content Specificity on Essential Dimensionality in U.S. and Japan Data 






Yu-Chung Lawrence Wang and Dennis Hocevar 



University of Southern California 



Paper presented at the annual meeting of the American Educational Research Association, April 1994, New
Orleans, LA.



The authors wish to thank Ian Westbury for providing access to the SIMS data. This research was partially 
supported by the Educational Policy Fellowship during the first year of the study and by Chiang Ching-Kuo 
Research Scholarship in the second year of the study. 




Abstract 

Stout (1987, 1990) has provided a weaker essential dimensionality assumption and argued that the
IRT model fits when Lord's assumption of unidimensionality is replaced by the assumption of essential
dimensionality. The major goal of this study is to apply Stout's essential dimensionality statistic and the
corresponding computer program (i.e., DIMTEST) to a hierarchical level mathematics achievement data set,
and, based on the result, to determine the extent to which the unidimensional assumption can be accurately
applied to mathematics achievement data. The study also ascertains if the unidimensionality assumption is
more tenable when applied to specific subsets of items (e.g., arithmetic, algebra, geometry and
measurement) rather than broader categories of items (e.g., eighth-grade general mathematics achievement).
A comparison of the essential unidimensionality structures across cultures (i.e., Japan vs. U.S.) also is
performed.

Results indicate that the assessment of essential dimensionality in the Second International
Mathematics Study (SIMS) Japan and U.S. data implies that there are several subscales in SIMS
mathematics tests, and that individual scores should be calibrated on each of the mathematics subscales rather
than on a total score in the SIMS test. The essential dimensionality estimates for the four tests in the
U.S. and Japan study were not the same. This result questions the equivalence of the dimensionality for the
four SIMS tests which share 40 common items and 35 randomly assigned unique items. According to the
results of every possible comparison of the essential dimensionality between the U.S. and Japan, tests in
the Japan study tend to be more essentially unidimensional than their U.S. counterparts. This result
implies either that the items on the test are more unidimensional in Japan than in the U.S., or that the ability
spaces among Japanese students are more homogeneous than those among U.S. students. Many restrictions in using
DIMTEST on real data were encountered and are discussed at the end of the study.



Item response theory (IRT) has been widely used in cross-cultural bias or DIF studies because of the
invariance property of its item parameters. The invariance property of IRT holds only when its two major
assumptions, unidimensionality and local independence, hold. The unidimensionality assumption (Lord,
1980) assumes that every individual taking the test uses the same single cognitive skill to respond to the
whole set of items.

Lord's unidimensionality assumption has been criticized as unrealistic and lacking an appropriate
statistical test (Traub, 1983). Humphreys (1982) warned that a dimensionally narrowed test would weaken
the validity of the test. Stout (1987, 1990) has provided a weaker essential dimensionality assumption and
argued that the IRT model fits when Lord's assumption of unidimensionality is replaced by the assumption
of essential dimensionality. The essential dimensionality assumption holds that multidimensional item
characteristics and examinee abilities are suitable for unidimensional IRT as long as there is a dominant trait.
Stout also provided a statistical test, later refined by Nandakumar, to assess whether or not
essential dimensionality holds for a set of items. Readers should refer to Stout (1987, 1990), Nandakumar
(1993), and Nandakumar and Stout (1993) for a detailed definition of essential dimensionality. Though
Stout and his colleagues have done many Monte Carlo studies on the essential dimensionality measures
using simulated data, few investigators have used a real test. This study aims to fill this gap by using four
different SIMS mathematics achievement tests.

The major goal of this study is to apply Stout's essential dimensionality statistic and 
corresponding computer program (i.e., DIMTEST) to a hierarchical level mathematics achievement data set, 
and, based on the result, to determine the extent to which the unidimensional assumption can be accurately 
applied to mathematics achievement data. The study also ascertains if the unidimensionality assumption is 
more tenable when applied to specific subsets of items (e.g., arithmetic, algebra, geometry and 
measurement) rather than broader categories of items (e.g., eighth-grade general mathematics achievement).
A comparison of the essential unidimensionality structures across cultures (i.e., Japan vs. U.S.) also is 
performed. 

The results of this study have important implications for the area of mathematics achievement
testing because contemporary test developers routinely use IRT methods to develop and refine tests. This 
study concentrates on the unidimensionality assumption since studies have found that violating the 
unidimensionality assumption produces a substantial lack of item parameter invariance (Ackerman, 1991; 
Oshima & Miller, 1990). 
Methodology

Data.

The data for this study were taken from parts of the Second International Mathematics Study
(SIMS) sponsored by the International Association for the Evaluation of Educational Achievement (1985).
Between 1980 and 1982, the Second IEA International Mathematics Study researchers collected data on
mathematics curricula, teaching practices, and achievement from samples of students, teachers, and schools
in 20 countries. SIMS was conducted at two levels: (1) Population A, in which students were (typically) in
the national grade in which the modal age was 13; and (2) Population B, in which students were taking the
most advanced pre-university mathematics course(s) offered in their school systems. Only the Population A
data sets from the United States and Japan were used in this study.
Subjects.

Population A is defined as all eighth graders in mainstream public and non-public schools.
Students with mental, physical, emotional, or learning disabilities who were placed in special education
classes were excluded. The stratification variables used in the SIMS study were school type (i.e.,
public vs. private), regional standard metropolitan statistical area (SMSA), location (i.e., east-central vs.
south-west), and metropolitan status (i.e., city, suburb, other or district outside SMSA).
Instruments.

There were two major SIMS study designs: longitudinal and cross-sectional. Both the U.S. and Japan
took part in the longitudinal study, but within that design Japan used the cross-sectional
instrument. The U.S. instrument was an eighth-grade mathematics achievement test of 180 items,
selected from the international bank of 196 items and divided into a 40-item core
subtest and four 35-item "rotated forms." For the Japanese instrument, the first 40 items of the
international bank of 196 items were assigned to the core test, the next 34 items were assigned to form A,
and so on. It is important to note that the test construction strategies for the cross-sectional and
longitudinal studies differed substantially, which introduces a certain degree of nonequivalence from the
outset. Table 1 shows the difference between the cross-sectional and longitudinal designs. The 40 core
items (different for the U.S. and Japan) were administered to all examinees, and rotated forms were randomly
assigned to students (approximately one-fourth of the students taking each form). In other words, each
student was administered the core and one rotated form for a total of 74 or 75 of the 196 items in the pool.
Tables 2.1 to 2.4 show the number of items in each content area for both the U.S. and Japan. Subcontent
areas with an asterisk (*) were selected for the essential unidimensionality examination reported herein.
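
The item counts in the two designs can be checked with a short calculation. The sketch below is only an illustration of the arithmetic stated in the text and in Table 1; the dictionary layout and function names are assumptions made for this example and are not part of the SIMS documentation.

```python
# Form sizes as stated in the text and Table 1.
# The data layout and names are illustrative assumptions.
DESIGNS = {
    "longitudinal (U.S.)": {"core": 40, "rotated_form": 35, "n_rotated_forms": 4},
    "cross-sectional (Japan)": {"core": 40, "rotated_form": 34, "n_rotated_forms": 4},
}


def total_items(design):
    """Number of distinct items used in a design: core plus all rotated forms."""
    return design["core"] + design["rotated_form"] * design["n_rotated_forms"]


def items_per_student(design):
    """Each student takes the core plus exactly one rotated form."""
    return design["core"] + design["rotated_form"]


for name, design in DESIGNS.items():
    print(name, total_items(design), items_per_student(design))
```

Under these figures the U.S. design uses 180 items with 75 per student, and the Japan design uses 176 items with 74 per student, which matches the totals reported in Table 1.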



Table 1. Test Construction Strategies for Both the SIMS Cross-sectional and Longitudinal Designs.

Cross-sectional Design (Japan)

    Form     Core    A     B     C     D
    Items     40     34    34    34    34

    The first 40 items of the international bank of 196 items were assigned to the core form, the next 34
    items were assigned to form A, and so on. The total number of items in this study design was 176.

Longitudinal Design (U.S.)

    Form     Core    A     B     C     D
    Items     40     35    35    35    35

    Items in each test form (i.e., core form, form A, etc.) were selected from an international bank of
    196 items, and the total number of items in this study design was 180.



Table 2.1. Content Table of Arithmetic Items for Japan and the U.S.

Arith.                  Japan                              U.S.
subarea      Form A  Form B  Form C  Form D     Form A  Form B  Form C  Form D
001             4       4       4       4          3       2       2       3
002*            4       4       3       3          6       6       6       6
003*            4       5               5          6       7       6       5
004*            4       3               4         11      10      10      11
005             0       1               1          0       1       1       1
006             2       1       2       1          1       1       1       0
008             1       2       1       2          1       1       1       0
009             0       0       0       1          0       0       0       1
# of total     19      20      19      22         28      28      27      27


Table 2.2. Content Table of Algebra Items for Japan and the U.S.

Algebra                 Japan                              U.S.
subarea      Form A  Form B  Form C  Form D     Form A  Form B  Form C  Form D
101*            4       3       4       2          2       3       2       4
102             2       2       1       2          1       0       0       0
103             0       1       1       0          1       1       0       0
104*            3       3       4       4          3       3       4       2
105             1       1       0       1          1       0       1       0
106*            3       4       4       3          5       5       5       6
107             2       1       1       2          1       2       2       1
110             2       1       2       2          0       0       0       1
# of total     17      16      18      16         14      14      14      14
Table 2.3. Content Table of Geometry Items for Japan and the U.S.

Geometry                Japan                              U.S.
subarea      Form A  Form B  Form C  Form D     Form A  Form B  Form C  Form D
201*         (counts illegible in source)
202*         (counts illegible in source)
203          (counts illegible in source)
204          (counts illegible in source)
205          (counts illegible in source)
206             1       2       1       1          0       2       1       0
207*            4       3       4       2          3       2       3       3
208             1       1       1       1          1       2       1       1
209             2       2       1       2          1       1       1       1
212             1       1       1       2          2       1       2       1
215             1       1       0       2          1       1       0       2
# of total     20      21      20      19         17      17      16      16


Table 2.4. Content Table of Measurement Items for Japan and the U.S.

Measure.                Japan                              U.S.
subarea      Form A  Form B  Form C  Form D     Form A  Form B  Form C  Form D
401*            2       3       2       2          3       2       3       3
402*            2       2       2       3          3       4       3       5
403             1       0       1       0          2       2       1       1
404*            5       5       4       5          4       4       6       4
# of total     10      10       9      10         12      12      13      13



This achievement test covered five major content areas: arithmetic, algebra, geometry, statistics,
and measurement. Only statistics items were excluded from the present study because of the small pool of items.
There were several subcontent categories under each major content area. For example, there were eight
subcontent areas within the arithmetic area: Natural Numbers (001), Common Fractions (002), Decimal
Fractions (003), Ratios, Proportions, and Percent (004), Number Theory (005), Power and Exponents
(006), Square Roots (008), and Dimensional Analysis (009). As shown in Tables 2.1 to 2.4, three
subcontent areas that contained a sufficient number of items were selected from each of the four major
content areas. They were Common Fractions (002), Decimal Fractions (003), and Ratio, Proportion and
Percent (004) in arithmetic; Integers (101), Formulas and Algebraic Expressions (104), and Equations and
Inequations (106) in algebra; Classification of Plane Figures (201), Properties of Plane Figures (202), and
Coordinates (207) in geometry; and Standard Units (401), Estimation (402), and Determination of Measures
(404) in measurement. Table 3 displays one sample item for each area in this study. Omits and "not
reaches" were treated as wrong answers. Readers who are interested in the complete content of all SIMS
items should refer to "Technical Report I" of SIMS (Chang & Ruzicka, 1985).
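
The scoring rule for omits and "not reaches" can be made concrete with a small sketch. The code below is an illustration only: representing a missing response as `None` is an assumption of this example, and the answer keys were worked out here from the four arithmetic and algebra sample items shown in Table 3 (YS003, YS012, YS015, YS017) rather than taken from a SIMS key file.

```python
def score_item(response, key):
    """Return 1 for a correct answer and 0 otherwise.

    A missing response (omitted or not reached) is represented here as None,
    which is an assumption of this sketch, and is scored as wrong.
    """
    if response is None:
        return 0
    return 1 if response == key else 0


def score_examinee(responses, keys):
    """Score one examinee's responses against the answer keys."""
    return [score_item(r, k) for r, k in zip(responses, keys)]


# Keys derived by hand for YS003, YS012, YS015, and YS017 in Table 3.
keys = ["e", "e", "d", "c"]
# Hypothetical examinee: one correct, one omitted, one wrong, one correct.
responses = ["e", None, "a", "c"]
print(score_examinee(responses, keys))
```

Treating missing responses as wrong keeps every examinee's score vector the same length, which the dichotomous 0/1 analyses later in the study require.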
Table 3. Sample Items with the International Item Code in Twelve SIMS Areas.

Math. Area                        Sample Item

Common Fraction (002)             YS003. 2/5 + 3/8 is equal to (a) 5/13 (b) 5/40 (c) 6/40
                                  (d) 16/15 (e) 31/40

Decimal Fraction (003)            YS005. 0.40 x 6.38 is equal to (a) 0.2552 (b) 2.452 (c) 2.552
                                  (d) 24.52 (e) 25.52

Ratio, Proportion (004)           YS008. In a school of 800 pupils, 300 are boys. The ratio of the
                                  number of boys to the number of girls is (a) 3:8 (b) 5:8 (c) 3:11
                                  (d) 5:3 (e) 3:5

Integer (101)                     YS012. (-2) x (-3) is equal to (a) -6 (b) -5 (c) -1 (d) 5 (e) 6

Formulas (104)                    YS015. Simplify: 5x + 3y + 2x - 4y (a) 7x + 7y (b) 8x - 2y
                                  (c) 6xy (d) 7x - y (e) 7x + y

Equation, Inequation (106)        YS017. If P = LW and if P = 12 and L = 3, then W is equal to
                                  (a) 3/4 (b) 3 (c) 4 (d) 12 (e) 36

Classify Plane Figure (201)       YS021. A quadrilateral MUST be a parallelogram if it has (a) one
                                  pair of adjacent sides equal (b) one pair of parallel sides (c) a
                                  diagonal as axis of symmetry (d) two adjacent angles equal (e) two
                                  pairs of parallel sides

Properties of Plane               YS023*. The length of the circumference of the circle with center O
Figures (202)                     is 24, and the length of arc RS is 4. What is the measure in degrees
                                  of the central angle ROS? (a) 24 (b) 30 (c) 45 (d) 60 (e) 90

Coordinates (207)                 YS028*. What are the coordinates of point P? (a) (-3, 4)
                                  (b) (-4, -3) (c) (3, 4) (d) (4, -3) (e) (-4, 3)

Standard Unit (401)               YS036. Which of the following is the most likely to be nearest to
                                  the weight of a normal man? (a) 8.5 kg (b) 85 kg (c) 185 kg
                                  (d) 850 kg (e) 1850 kg

Estimation (402)                  YS038*. On the above scale the reading indicated by the arrow is
                                  between (a) 51 and 52 (b) 57 and 58 (c) 60 and 62 (d) 62 and 64
                                  (e) 64 and 66

Determination of                  YS037*. The total area of the two triangles is (a) 6 × 8 cm²
Measures (404)                    (b) 6 × 8/2 cm² (c) 10 × 6/2 cm² (d) 16 × 12/2 cm²
                                  (e) 20 × 12/2 cm²

Note. * denotes that the accompanying figure for an item was omitted in this table.
Four forms of the eighth-grade mathematics test (forms A, B, C, and D) were investigated in both
the U.S. and Japan studies and were labeled tests 1 to 4 for U.S. forms A, B, C, and D and tests 5 to 8 for
Japan forms A, B, C, and D, respectively (see Table 4). As mentioned earlier, every SIMS mathematics
achievement test covers five mathematics areas: arithmetic, algebra, geometry, measurement, and statistics.
The statistics subtests were excluded because of their small number of items and low reliability. Every subtest
was relabeled for this study, and these labels are displayed in Table 5. The first number of the subtest label
denotes the source test, and the extension number 1, 2, 3, or 4 denotes arithmetic, algebra,
geometry, or measurement, respectively.

The reliability coefficients of all tests for both countries were computed and are displayed in Table 6.
The standardized coefficient alpha ranged from .65 to .89 in the U.S. study and from .64 to .83 in the Japan
study. Measurement tests were the least reliable among the four SIMS areas. Three measurement content
tests out of a total of twelve tests had reliabilities lower than 0.70 in the U.S. study, and one measurement
test out of twelve tests had a reliability lower than 0.70 in Japan.
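
For readers who want to see how a KR-20 coefficient of the kind reported in Table 6 is obtained from dichotomous item scores, the following is a minimal sketch. It uses the standard KR-20 formula with the population variance of total scores; the function name and the toy response matrix are assumptions of this example, and the paper does not state which software was used for Table 6.

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    scores: list of rows, one row per examinee, one 0/1 entry per item.
    Uses the population variance (divide by n) of the total scores.
    """
    n = len(scores)          # number of examinees
    k = len(scores[0])       # number of items
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of item variances p * (1 - p), where p is the proportion correct.
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in scores) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


# Toy data (hypothetical): four examinees, three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(toy), 2))
```

Because KR-20 depends on the number of items as well as their intercorrelations, short subtests such as the 9 to 13 item measurement subtests would be expected to show lower values than the longer arithmetic subtests.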



Table 4. Labels of 8 SIMS Tests.

Test    Items    N       Description
1       75       1652    U.S. test A
2       75       1610    U.S. test B
3       75       1668    U.S. test C
4       75       1619    U.S. test D
5       74       1986    Japan test A
6       74       1982    Japan test B
7       74       1965    Japan test C
8       74       1851    Japan test D



Table 5. Labels of Subtests in SIMS Study.

Test    Items    N       Description

U.S. tests
1.1     28       1652    Subtest of test 1 (Arithmetic)
1.2     14       1652    Subtest of test 1 (Algebra)
1.3     17       1652    Subtest of test 1 (Geometry)
1.4     12       1652    Subtest of test 1 (Measurement)
2.1     28       1610    Subtest of test 2 (Arithmetic)
2.2     14       1610    Subtest of test 2 (Algebra)
2.3     17       1610    Subtest of test 2 (Geometry)
2.4     12       1610    Subtest of test 2 (Measurement)
3.1     27       1668    Subtest of test 3 (Arithmetic)
3.2     14       1668    Subtest of test 3 (Algebra)
3.3     16       1668    Subtest of test 3 (Geometry)
3.4     13       1668    Subtest of test 3 (Measurement)
4.1     27       1619    Subtest of test 4 (Arithmetic)
4.2     14       1619    Subtest of test 4 (Algebra)
4.3     16       1619    Subtest of test 4 (Geometry)
4.4     13       1619    Subtest of test 4 (Measurement)

Japan tests
5.1     19       1986    Subtest of test 5 (Arithmetic)
5.2     17       1986    Subtest of test 5 (Algebra)
5.3     20       1986    Subtest of test 5 (Geometry)
5.4     10       1986    Subtest of test 5 (Measurement)
6.1     20       1982    Subtest of test 6 (Arithmetic)
6.2     16       1982    Subtest of test 6 (Algebra)
6.3     21       1982    Subtest of test 6 (Geometry)
6.4     10       1982    Subtest of test 6 (Measurement)
7.1     19       1965    Subtest of test 7 (Arithmetic)
7.2     18       1965    Subtest of test 7 (Algebra)
7.3     20       1965    Subtest of test 7 (Geometry)
7.4      9       1965    Subtest of test 7 (Measurement)
8.1     22       1851    Subtest of test 8 (Arithmetic)
8.2     16       1851    Subtest of test 8 (Algebra)
8.3     19       1851    Subtest of test 8 (Geometry)
8.4     10       1851    Subtest of test 8 (Measurement)
Table 6. The Reliability Coefficients of the Four Major Mathematical Scales in Both the U.S. and Japan
Data in SIMS.

                           FORM A          FORM B          FORM C          FORM D
         Test Content    KR-20  St. α    KR-20  St. α    KR-20  St. α    KR-20  St. α

U.S.     Arithmetic       .88    .88      .88    .88      .89    .89      .86    .86
         Algebra          .80    .80      .78    .78      .77    .78      .73    .73
         Geometry         .72    .72      .75    .74      .74    .74      .75    .75
         Measure          .68    .67      .68    .66      .66    .65      .72    .71

Japan    Arithmetic       .79    .79      .77    .76      .78    .78      .78    .78
         Algebra          .82    .82      .77    .78      .81    .82      .83    .83
         Geometry         .81    .81      .75    .74      .76    .75      .74    .74
         Measure          .74    .74      .72    .74      .64    .65      .71    .72



Assessing Essential Dimensionality.

Stout's test of essential unidimensionality (Stout, 1987, 1990), which is available in the computer
program DIMTEST (Stout, Douglas, Junker, & Roussos, 1992), was applied in the present study. A brief
summary of the steps of Stout's procedure for assessing unidimensionality follows:

1. The source test items are divided into three subtests: assessment subtests 1 and 2 and a
partitioning subtest. The items in the AT1 subtest (i.e., assessment subtest 1) are selected to be as
unidimensional as possible and to be dimensionally distinct from the remaining items. AT1 items can
be selected by expert opinion or by using factor analysis to choose the items with the highest same-sign
loadings on the second extracted factor. Items in assessment subtest 2, or AT2, are selected by the
DIMTEST computer program from the rest of the test so that their difficulty level is similar to that of the AT1
items. (The function of AT2 is to correct for pre-asymptotic statistical bias in Stout's statistic T.) Items
in the PT subtest (i.e., the partitioning subtest, which consists of the items remaining after AT1 and AT2
are selected) are used to group examinees based on their PT score.

2. Examinees are assigned to K different subgroups according to their PT score. Examinees who answer all
PT items correctly or incorrectly are excluded. A PT subgroup with too few examinees (fewer than 5 in this
study) is deleted.

3. The variance estimate $\hat{\sigma}_k^2$ for each PT subgroup on AT1 is computed.

4. The unidimensional variance estimate $\hat{\sigma}_{U,k}^2$ for each PT subgroup on AT1 is computed. See
Nandakumar (1993) for computational formulas.

5. The two variance estimates are used to obtain the $T_L$ statistic:

$$T_L = \frac{1}{\sqrt{K}} \sum_{k=1}^{K} \frac{\hat{\sigma}_k^2 - \hat{\sigma}_{U,k}^2}{S_k}$$

where $S_k$ is the standard error of estimate.

6. Compute a similar statistic $T_B$ on AT2.

7. The essential dimensionality test is performed with the null hypothesis that the degree of essential
dimensionality of the whole test is equal to 1 (i.e., the test is unidimensional) against the
alternative hypothesis that the degree of essential dimensionality is greater than 1 (i.e., the
test is not unidimensional). Stout's unidimensionality test statistic T is given by

$$T = \frac{T_L - T_B}{\sqrt{2}}$$

The null hypothesis of unidimensionality ($H_0: d_E = 1$) is rejected when T is
greater than the chosen upper percentile of the standard normal distribution.
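
The way the pieces of steps 3 to 7 fit together can be sketched in code. This is not the DIMTEST program: it is a minimal illustration that assumes the usual definitions from Stout (1987), namely that the observed variance in a PT subgroup is the variance of proportion-correct scores on AT1 and that the "unidimensional" variance is the sum of item variances p(1 - p) divided by the squared number of AT1 items. The standard error S_k is taken as an input, because its formula (see Nandakumar, 1993) is not reproduced in the text. All function names are assumptions of this example.

```python
import math


def variance_estimates(at1_scores):
    """Return (observed variance, unidimensional variance) for one PT subgroup.

    at1_scores: list of 0/1 rows, one per examinee in the subgroup, over AT1 items.
    Assumes the proportion-correct definitions described in the lead-in.
    """
    j = len(at1_scores)       # examinees in the subgroup
    m = len(at1_scores[0])    # number of AT1 items
    y = [sum(row) / m for row in at1_scores]
    y_bar = sum(y) / j
    sigma2 = sum((v - y_bar) ** 2 for v in y) / j
    p = [sum(row[i] for row in at1_scores) / j for i in range(m)]
    sigma2_u = sum(pi * (1 - pi) for pi in p) / m ** 2
    return sigma2, sigma2_u


def t_component(subgroups):
    """Combine subgroups into T_L (or T_B).

    subgroups: list of (sigma2, sigma2_u, s_k) tuples, with s_k supplied by the caller.
    """
    k = len(subgroups)
    return sum((s2 - s2u) / sk for s2, s2u, sk in subgroups) / math.sqrt(k)


def stout_t(t_l, t_b):
    """Bias-corrected statistic T = (T_L - T_B) / sqrt(2)."""
    return (t_l - t_b) / math.sqrt(2)


def upper_tail_p(t):
    """One-sided p-value: upper-tail area of the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2))


print(round(upper_tail_p(0.73), 2))
```

The last line converts a T value of .73 into its one-sided p-value; the result agrees with the .23 reported for U.S. test 1 with arithmetic AT1 items in Table 7.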

In this study, the essential dimensionality of the eight SIMS test forms (test 1 to test 8) was assessed
four times by treating all items within the same major content area (i.e., arithmetic, algebra, geometry, and
measurement) as AT1 items. AT2 and PT items were selected from the remaining three major content
areas. In other words, the degree of essential dimensionality for a single test was estimated four times
using four different sets of AT1 items. For example, to test the essential dimensionality of test 1 using 28
arithmetic items (or test 1.1) as AT1 items, the remaining algebra, geometry, statistics, and measurement
items were treated as AT2 and PT items (a total of 47 items). Similarly, the essential dimensionality of
test 1 can be assessed using 14 algebra items (or test 1.2) as AT1 items, with the AT2 and PT items
selected from the remaining items.

Along the same lines, the essential dimensionality of the four mathematics content areas (i.e.,
arithmetic, algebra, geometry, and measurement) was assessed using three possible homogeneous sets of AT1
items within a particular content area. For example, to assess the essential dimensionality of
arithmetic in test 1 (or test 1.1), three possible groups of AT1 items (i.e., 3 common fraction, 6 decimal
fraction, and 6 ratio proportion items) were used separately as the AT1 item pool, and the remaining arithmetic
items in test 1 were treated as AT2 and PT items. The effect of the number of AT1 items was also
investigated in the study.

DIMTEST can be run through a user-friendly interactive subprogram called irtgo. Users should type
irtgo in the directory containing DIMTEST and specify all the parameters correctly. Two essential
dimensionality statistics are printed on the screen and saved in the last section of the output file. Readers
should refer to the DIMTEST manual for detailed information on running DIMTEST.
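
The four AT1 assignments for a single test can be tabulated directly from the subtest sizes in Table 5. The sketch below does this for test 1; the dictionary and function names are illustrative assumptions, and the code only counts items. It does not perform the AT2 selection, which DIMTEST carries out internally.

```python
# Item counts for U.S. test 1 by content area, from Table 5.
ITEMS_TEST1 = {"arithmetic": 28, "algebra": 14, "geometry": 17, "measurement": 12}
TOTAL_ITEMS_TEST1 = 75  # from Table 4; the items not listed above are statistics items


def at1_splits(area_counts, total_items):
    """For each content area used as AT1, report the AT1 size and the size of
    the pool left over for AT2 and PT."""
    return {
        area: {"at1": n, "remaining": total_items - n}
        for area, n in area_counts.items()
    }


for area, split in at1_splits(ITEMS_TEST1, TOTAL_ITEMS_TEST1).items():
    print(area, split["at1"], split["remaining"])
```

With arithmetic as AT1 the remaining pool has 47 items, as stated in the text; with algebra as AT1 it has 61.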

Results 

The top section of Table 7 shows the summary results for Stout's essential dimensionality
statistics provided by the computer program DIMTEST for four SIMS test forms across the U.S. and Japan
datasets. The degree of essential dimensionality of the four SIMS test forms (1, 2, 3, and
4) varied when different AT1 items were used. For instance, in the U.S. data, test 1 was identified as
essentially unidimensional when arithmetic and measurement items were used as AT1 items, but it was
identified as essentially multidimensional when algebra and geometry items were used as AT1. This
association between the essential dimensionality statistics and the characteristics of the AT1 items
was consistently found in the three other tests (2, 3, and 4). For example, test 2 was flagged as essentially
unidimensional when arithmetic or geometry items were used as AT1, but flagged as multidimensional
when algebra and measurement items were the AT1 items. The table also shows that none of the SIMS
tests was identified as essentially unidimensional across all four AT1 situations. In summary, SIMS
tests 1, 2, 3, and 4 satisfied the essential dimensionality assumption 2, 2, 2, and 3 times, respectively, out
of four trials each in the U.S. data.

A similar pattern of essential dimensionality was found for the Japan data and is displayed in the
bottom section of Table 7. Specifically, an association between the degree of essential dimensionality
and the choice of AT1 items was found, and again, none of the four SIMS tests in the Japan study was
consistently detected as unidimensional across the four AT1 cases. However, the SIMS Japan tests presented a
slightly higher degree of essential unidimensionality than the U.S. tests. That is, for all possible pairs of
comparisons, the degree of essential unidimensionality for the Japan tests tended to be higher than for the
U.S. tests. For instance, for three tests (5, 6, and 8) using arithmetic AT1 items, the Japanese data showed
better essential dimensionality results than the U.S. data.

It is also noteworthy that the arithmetic items in test 3 in the U.S. data and the measurement
items in tests 5 and 6 in the Japan data were found to be too easy and not appropriate as AT1 items. Also, the
SIMS tests were more likely to be identified as multidimensional when the AT1 items were
algebra items in both samples. This result may imply that the dimensionality structure of the algebra
items differs markedly from that of the other three mathematics areas. Finally, all eight SIMS tests were
identified as multidimensional at least once in this analysis.

The four SIMS U.S. general tests, which have 40 common items and 35 randomly assigned unique
items, did not have the same essential dimensionality estimates. For instance, test 1 was flagged as
moderately unidimensional with arithmetic AT1 items, while test 2 and test 4 were identified as borderline
unidimensional. Similar inconsistencies were found in the Japan data.

Table 8 provides a summary of Stout's essential dimensionality statistics for the four different
arithmetic tests using three arithmetic subcontents as AT1 in both countries. The top section of this table
shows, first, that the U.S. natural number items in tests 1.1, 2.1, and 3.1 are not appropriate as AT1 items and,
second, that the arithmetic tests are flagged as essentially unidimensional in eight out of nine trials. This result
indicates that the degree of essential dimensionality improved markedly when the arithmetic subcontent
area was considered as a test rather than the whole 75-item mathematics achievement test.

The bottom section of Table 8 shows Stout's essential dimensionality statistics for the four
arithmetic tests in the Japan study. Four subcontent areas in arithmetic were used as AT1. As in the
U.S. data, the degree of essential unidimensionality is better for a content-specific test than for a general
test. For instance, only two of fourteen arithmetic tests (tests 5.1 and 7.1) violated the essential
unidimensionality assumption in the Japan study.

Another finding from this analysis is that the essential dimensionality structure of the tests is not
the same for the U.S. and Japan. This result may imply that the cognitive abilities of the two groups
differ, which results in a discrepancy in the interaction between respondents and SIMS items across the
two distinctive cultures. However, since the two studies used only partially common items, the discrepancy
in the essential dimensionality structure is confounded. In particular, because of the initial analyses (Table
7), the authors question the stability of Stout's essential dimensionality statistics. In other words, the
discrepancies in the degree of essential dimensionality across different test forms and cultures may reflect
the fact that Stout's two essential dimensionality statistics are not reliable. Further study is needed to
investigate the reliability of the essential dimensionality statistics and the validity of replacing Lord's
unidimensionality assumption with Stout's essential dimensionality assumption.

The ratio proportion items in the U.S. were not used here because there were too many of them.
Stout (1992) suggested that, for the best essential dimensionality estimates, the number of AT1 items should
be greater than one fourth but less than one third of the total items. There were 11, 10, 10, and 11
ratio items in forms A, B, C, and D, respectively, out of a total of 28, 28, 27, and 27 arithmetic items, which
violated Stout's rule. As a result, the essential dimensionality tests using ratio proportion items
were skipped in the U.S. data.
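
The size rule can be checked with exact fractions. The sketch below is an illustration only; the function name and the use of strict inequalities at both bounds are assumptions of this example, since the text states the rule only as "greater than one fourth but less than one third."

```python
from fractions import Fraction


def at1_size_ok(n_at1, n_total, lower=Fraction(1, 4), upper=Fraction(1, 3)):
    """Return True when the AT1 share of the test lies strictly between the bounds."""
    share = Fraction(n_at1, n_total)
    return lower < share < upper


# U.S. ratio items per form (A, B, C, D) and arithmetic totals, from the text.
ratio_items = [11, 10, 10, 11]
arithmetic_totals = [28, 28, 27, 27]
for n_at1, n_total in zip(ratio_items, arithmetic_totals):
    print(n_at1, n_total, round(n_at1 / n_total, 3), at1_size_ok(n_at1, n_total))
```

In all four forms the ratio items make up more than one third of the arithmetic items (between roughly .36 and .41), so each case fails the check.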

Similar results were found in algebra, geometry, and measurement using three different AT1
assignments within each content area. Results are presented in Table 9, Table 10, and Table 11,
respectively. In Table 9, only test 2.2 using integer AT1 items for the U.S. data and tests 5.2 and 8.2 using
equation AT1 items for the Japan data indicated a lack of essential dimensionality. This result indicates that
the degree of essential unidimensionality for the algebra content improves markedly in comparison
to the four achievement tests taken as a whole (i.e., Table 7). Only one geometry test in the U.S. data (4.3)
and one in the Japan data (6.3) were found to be
essentially multidimensional (notably, both used "coordinates" as AT1 items). The essential dimensionality
statistics for many geometry and measurement contents were not calculated because of an inappropriate number
of AT1 items, which is one restriction in applying DIMTEST to real-life data. Nevertheless, the trend is toward
better support for essential dimensionality when content areas rather than whole tests are analyzed.
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Table 7. Essential Dimensionality Statistics for F SIMS General Tests. 





Country  Target Test  #ATI/#Total  ATI           Stout's T  P-value  Refined T  P-value

U.S.     1            (28/75)      Arithmetic    .73        .23      .60        .27
         2            (28/75)                    1.36       .09      1.35       .08
         3            (27/75)                    ATI items failed the difficulty test.
         4            (27/75)                    1.50       .07      1.48       .07
         1            (14/75)      Algebra       3.75       .00**    4.22       .00**
         2            (14/75)                    4.54       .00**    4.96       .00**
         3            (14/75)                    4.58       .00**    5.09       .00**
         4            (14/75)                    1.01       .16      1.16       .12
         1            (17/75)      Geometry      2.15       .02*     2.32       .01*
         2            (17/75)                    1.58       .06      1.60       .06
         3            (16/75)                    1.50       .07      1.53       .06
         4            (16/75)                    3.00       .00**    3.41       [illegible]
         1            (12/75)      Measure.      -.42       .66      -.50       .69
         2            (12/75)                    1.92       .03*     2.19       .01*
         3            (13/75)                    .35        .36      .46        .32
         4            (13/75)                    .18        .43      .18        .43

Japan    5            (19/74)      Arithmetic    .20        .42      .17        .43
         6            (20/74)                    -.59       .72      -.67       .75
         7            (19/74)                    2.28       .01*     2.58       [illegible]
         8            (21/74)                    .81        .20      .99        .16
         5            (17/74)      Algebra       1.19       .12      1.11       .13
         6            (16/74)                    .78        .22      .73        .23
         7            (18/74)                    1.54       .06      1.70       .04*
         8            (16/74)                    2.36       .01**    2.55       .01**
         5            (20/74)      Geometry      1.90       .03*     2.06       .02*
         6            (21/74)                    .13        .45      .08        .47
         7            (20/74)                    .79        .22      .90        .18
         8            (19/74)                    1.28       .12      1.24       .11
         5            (10/74)      Measure.      ATI items failed the difficulty test.
         6            (10/74)                    ATI items failed the difficulty test.
         7            (09/74)                    1.03       .15      1.19       .12
         8            (10/74)                    1.90       .03*     2.28       .01*

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Table 8. Essential Dimensionality Statistics for 8 Arithmetics Tests 





Country  Target Test  #ATI/#Total  ATI                Stout's T  P-value  Refined T  P-value

U.S.     1.1          (3/28)       Natural Numbers    ATI items failed the difficulty test.
         2.1          (2/28)                          Too few ATI items for DIMTEST.
         3.1          (2/27)                          Too few ATI items for DIMTEST.
         4.1          (3/27)                          -.26       .60      -.35       .63
         1.1          (6/28)       Common Fractions   .53        .30      .60        .27
         2.1          (6/28)                          .06        .48      .06        .48
         3.1          (6/27)                          1.36       .09      1.64       .05*
         4.1          (6/27)                          -.11       .55      -.19       .58
         1.1          (6/28)       Decimal Fractions  .78        .22      .83        .20
         2.1          (7/28)                          .44        .33      .56        .29
         3.1          (6/27)                          -.19       .58      -.25       .60
         4.1          (5/27)                          1.34       .09      1.63       .05

Japan    5.1          (4/19)       Natural Numbers    ATI items failed the difficulty test.
         6.1          (4/20)                          -.29       .61      -.31       .62
         7.1          (4/19)                          -1.17      .88      -1.31      .90
         8.1          (4/21)                          -1.83      .97      -2.26      .99
         5.1          (4/19)       Common Fractions   .50        .31      .52        .30
         6.1          (4/20)                          1.14       .13      1.36       .09
         7.1          (3/19)                          ATI items failed the difficulty test.
         8.1          (3/21)                          1.19       .12      1.54       .06
         5.1          (4/19)       Decimal Fractions  -.68       .75      -.78       .78
         6.1          (5/20)                          -.66       .75      -.83       .80
         7.1          (5/19)                          .11        .46      .11        .46
         8.1          (5/21)                          1.41       .08      1.59       .06
         5.1          (4/19)       Ratio Proportions  1.87       .03*     2.09       .02*
         6.1          (3/20)                          .22        .41      .28        .39
         7.1          (3/19)                          1.54       .06      1.94       .03*
         8.1          (4/21)                          .94        .17      .11        .14

Table 9. Essential Dimensionality Statistics for 8 Algebra Tests

Country  Target Test  #ATI/#Total  ATI           Stout's T  P-value  Refined T  P-value

U.S.     1.2          (3/14)       Integers      .76        .22      .96        .17
         2.2          (3/14)                     1.82       .03*     2.18       .01*
         3.2          (3/14)                     .30        .38      .35        .36
         4.2          (4/14)                     .96        .17      1.00       .16
         1.2          (3/14)       Formulas      .29        .39      .37        .36
         2.2          (3/14)                     -.01       .50      -.07       .53
         3.2          (4/14)                     .16        .44      .14        .44
         4.2          (3/14)                     -.49       .69      -.69       .75
         1.2          (5/14)       Equations     -.83       .80      -.76       .78
         2.2          (5/14)                     -.16       .56      -.11       .54
         3.2          (5/14)                     .38        .35      .44        .33
         4.2          (6/14)                     -.78       .78      -.65       .74

Japan    5.2          (4/17)       Integers      -.21       .58      -.38       .65
         6.2          (3/16)                     -1.90      .97      -2.36      .99
         7.2          (4/18)                     -1.60      .95      -1.93      .97
         8.2          (2/16)                     Too few ATI items for DIMTEST.
         5.2          (3/17)       Formulas      -2.42      .99      -2.92      .99
         6.2          (3/16)                     .25        .40      .31        .38
         7.2          (4/18)                     .55        .29      .72        .23
         8.2          (4/16)                     .58        .28      .63        .26
         5.2          (3/17)       Equations     2.81       .00**    3.47       .00**
         6.2          (4/16)                     -.83       .80      -.91       .82
         7.2          (4/18)                     .07        .47      .11        .46
         8.2          (3/16)                     1.89       .03*     2.30       .01*

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Table 10. Essential Dimensionality Statistics for 8 Geometry Tests 
Country  Target Test  #ATI/#Total  ATI                           Stout's T  P-value  Refined T  P-value

U.S.     1.3          (3/17)       Classificat. of Plane Figure  1.06       .14      1.09       .14
         2.3          (2/17)                                     Too few ATI items for DIMTEST.
         3.3          (2/16)                                     Too few ATI items for DIMTEST.
         4.3          (1/16)                                     Too few ATI items for DIMTEST.
         1.3          (3/17)       Properties of Plane Figure    .85        .80      -1.17      [illegible]
         2.3          (4/17)                                     -.80       .79      -.93       .82
         3.3          (3/16)                                     .38        .35      .53        .30
         4.3          (4/16)                                     .37        .35      .45        .32
         1.3          (3/17)       Coordinates                   1.29       .10      1.60       .05
         2.3          (2/17)                                     Too few ATI items for DIMTEST.
         3.3          (3/16)                                     .13        .45      .23        .41
         4.3          (3/16)                                     2.16       .02*     2.50       .01**

Japan    5.3          (3/20)       Classificat. of Plane Figure  -1.25      .89      -1.60      .95
         6.3          (3/21)                                     .16        .44      .26        .40
         7.3          (3/20)                                     -.46       .68      -.60       .73
         8.3          (3/19)                                     -.68       .75      -.89       .81
         5.3          (3/20)       Properties of Plane Figure    .55        .29      .64        .26
         6.3          (2/21)                                     Too few ATI items for DIMTEST.
         7.3          (4/20)                                     -.04       .52      -.01       .50
         8.3          (2/19)                                     Too few ATI items for DIMTEST.
         5.3          (4/20)       Coordinates                   .92        .18      1.12       .13
         6.3          (3/21)                                     3.05       .00**    3.87       .00**
         7.3          (4/20)                                     -.63       .74      -.74       .77
         8.3          (2/19)                                     Too few ATI items for DIMTEST.

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Table 11. Essential Dimensionality Statistics for 8 Measurement Tests 





Country  Target Test  #ATI/#Total  ATI                     Stout's T    P-value      Refined T    P-value

U.S.     1.4          (3/12)       Standard Units          2.39         .01**        2.97         .00**
         2.4          (2/12)                               Too few ATI items for DIMTEST.
         3.4          [illegible]                          1.09         .14          1.22         .11
         4.4          [illegible]                          2.01         .02*         2.49         .01**
         1.4          (3/12)       Estimations             -.49         .69          [illegible]  .70
         2.4          (4/12)                               [illegible]  [illegible]  -.75         .77
         3.4          [illegible]                          ATI items failed the difficulty test.
         4.4          [illegible]                          Too many ATI items for DIMTEST.
         1.4          (4/12)       Determinat. of Measure  -2.32        .99          -2.52        .99
         2.4          (4/12)                               -1.80        .96          -1.99        .98
         3.4          (6/13)                               Too many ATI items for DIMTEST.
         4.4          (4/13)                               -1.21        .89          -1.28        .90

Japan    5.4          (2/10)       Standard Units          Too few ATI items for DIMTEST.
         6.4          (3/10)                               -1.94        .97          -2.42        .99
         7.4          (2/09)                               Too few ATI items for DIMTEST.
         8.4          (2/10)                               Too few ATI items for DIMTEST.
         5.4          (2/10)       Estimations             Too few ATI items for DIMTEST.
         6.4          (2/10)                               Too few ATI items for DIMTEST.
         7.4          (2/09)                               Too few ATI items for DIMTEST.
         8.4          (3/10)                               .40          .34          .37          .35
         5.4          (5/10)       Determinat. of Measure  Too many ATI items for DIMTEST.
         6.4          (5/10)                               Too many ATI items for DIMTEST.
         7.4          (4/09)                               Too many ATI items for DIMTEST.
         8.4          (5/10)                               Too many ATI items for DIMTEST.

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Conclusions and Discussion 

The assessment of essential dimensionality in the SIMS Japan and U.S. data implies that there are
several subscales in the SIMS mathematics tests, and that individual scores should be calibrated on each of
the mathematical subscales rather than on a total score in the Second International Mathematical Achievement
Test. In other words, scores reported separately by subscale (such as arithmetic, algebra, geometry,
and measurement) are more appropriate than a single general scale (such as general eighth-grade mathematics
achievement). Furthermore, when unidimensional IRT is applied to calibrate items, items within the same
content area should be calibrated on the same scale. More importantly, with reference to the use of IRT, it
is readily apparent that the routine use of unidimensional IRT methods to calibrate mathematics achievement
items deserves further scrutiny. On the type of test to which IRT methods are sometimes applied (e.g., a
general eighth-grade mathematics achievement test), a substantial violation of the unidimensionality
assumption was uncovered in the present analysis.

Secondly, in the analysis of the whole tests, the degree of essential dimensionality was found to
depend on the choice of ATI items and on the particular form selected. The effects of the size of the ATI
item pool and of the sample size may also need to be considered.

The essential dimensionality estimates for the four general tests in each study (U.S. and Japan)
were not the same. This result calls into question the equivalence of dimensionality among the four U.S.
tests and among the four Japan tests. In other words, comparing scores across the four U.S. tests or across
the four Japan tests may be inappropriate when the dimensionality of the tests varies significantly.

According to the results of every possible comparison of essential dimensionality between the
U.S. and Japan, tests in the Japan study tend to be more essentially unidimensional than their U.S.
counterparts. This result implies either that the items on the test are more unidimensional in Japan than in
the U.S., or that the ability spaces among Japanese students are more homogeneous than those among U.S.
students. However, one limitation of these conclusions is that the U.S. and Japan tests were not identical
to begin with.
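
One rough way to see this tendency is to count, for the whole-test analyses in Table 7, how many of the
computed Stout's T statistics are significant at the .05 level in each country. The Python sketch below is
illustrative only: the dictionary layout and the function name tally are hypothetical, None marks forms for
which DIMTEST could not be run, the p-values are the rounded values printed in Table 7, and the entry for
U.S. arithmetic form 4 is read as .07 from a partly illegible scan.

```python
ALPHA = 0.05

# P-values for Stout's T from Table 7, by country and ATI content area.
# None = statistic not computed (ATI items failed the difficulty test).
p_values = {
    "U.S.": {
        "Arithmetic": [0.23, 0.09, None, 0.07],
        "Algebra": [0.00, 0.00, 0.00, 0.16],
        "Geometry": [0.02, 0.06, 0.07, 0.00],
        "Measurement": [0.66, 0.03, 0.36, 0.43],
    },
    "Japan": {
        "Arithmetic": [0.42, 0.72, 0.01, 0.20],
        "Algebra": [0.12, 0.22, 0.06, 0.01],
        "Geometry": [0.03, 0.45, 0.22, 0.12],
        "Measurement": [None, None, 0.15, 0.03],
    },
}


def tally(by_content):
    """Return (number flagged at ALPHA, number of computed statistics)."""
    computed = [p for ps in by_content.values() for p in ps if p is not None]
    flagged = sum(1 for p in computed if p < ALPHA)
    return flagged, len(computed)


for country, by_content in p_values.items():
    flagged, computed = tally(by_content)
    print(f"{country}: {flagged} of {computed} computed tests flagged as multidimensional")
```

With these inputs the count is 6 of 15 for the U.S. and 4 of 14 for Japan. The count ignores the choice of
ATI items and the content-specific analyses in Tables 8 to 11, so it is a summary aid rather than a test.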

Many restrictions in using DIMTEST on real data were encountered. The first restriction is related
to the unclear definition of ATI items. According to the analyses in this study, the essential
dimensionality estimates are highly associated with the selection of ATI items. Stout has suggested that
DIMTEST users select ATI items to be as dimensionally homogeneous as possible and as
dimensionally distinct from the other items as possible. Hence, the degree of essential dimensionality for a
specific group of items may vary when the dimensionality of the ATI items changes. For example, in the
present study, the degree of essential dimensionality for a particular test was found to be different when
the ATI items were changed. Table 7 shows test 1 in the U.S. as essentially unidimensional when
arithmetic items were treated as the ATI items. When the algebra or geometry items were the ATI items, the
same test was flagged as multidimensional. This discrepancy may have resulted from the variation in the
degree of essential dimensionality across the four SIMS mathematical contents.
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The second restriction in using DIMTEST on a real test is related to the requirement on the size 
and difficulty of ATI items. Stout suggested that the most appropriate ATI size is one fourth of the total
items, which is fairly hard to satisfy in a real achievement test with fewer than three subcategories or
with a subcategory having very few items. An equal distribution of items across three subcategories of a
test may lead to too many ATI items if all the items within one subcategory are selected as
ATI items. DIMTEST cannot be performed in that situation.

The last limitation of DIMTEST results arises from their sample-dependent character. The
essential dimensionality statistics themselves, as discussed earlier, measure the interaction between a group
of subjects and items, using the original item responses for the analysis. The degree of essential
dimensionality may therefore change when the homogeneity of the respondents' cognitive ability space
changes. Hence, to validly generalize the essential dimensionality result for a test across samples,
the homogeneity of the cognitive space across groups of examinees needs to be confirmed.
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